Transform Rule Implementation

Transform rules are a special type of rule whose purpose is to “transform” incoming data from a specific field into other data. This transformation can range from simply dividing all incoming numerical data by a factor of 1000 (rule:apply_static_transform) or adding a prefix to incoming strings (rule:add_prefix_suffix), to mapping incoming values to entirely different values (rule:map_values) or retrieving specific portions of incoming strings with a regular expression (rule:apply_regex). Transform rules can be chained together to create transformations that consist of multiple steps. For example, given the string 'WEATHER_STATION_XXX_KNMI', where 'XXX' is an ID, the result 'KNMI_XXX' can be produced by combining apply_regex (to retrieve the ID) with add_prefix_suffix (to add the 'KNMI_' prefix).
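The two-step chain described above can be sketched in plain pandas. This is only an illustration of the data flow, not the platform's implementation of apply_regex or add_prefix_suffix; the column name `station`, the sample IDs and the regular expression are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data; '123' and '456' stand in for the 'XXX' ID.
df = pd.DataFrame(
    {"station": ["WEATHER_STATION_123_KNMI", "WEATHER_STATION_456_KNMI"]}
)

# Step 1 (the role apply_regex plays): extract the ID between the fixed parts.
# expand=False returns a Series because the pattern has a single group.
ids = df["station"].str.extract(r"^WEATHER_STATION_(.+)_KNMI$", expand=False)

# Step 2 (the role add_prefix_suffix plays): prepend the 'KNMI_' prefix.
result = "KNMI_" + ids

print(result.tolist())
```

Each step only needs the output of the previous one, which is what makes it possible to express the transformation as a chain of small, reusable rules.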

Because transform rules are applied to incoming data, they are the only type of rule that can be used within Transformation Configurations. Transform rules can also be used within flow designs, but this is very uncommon: they must inherit from a different base class than other rule types (AbstractTransformRule instead of AbstractRule). This separate base class lacks many of the methods that AbstractRule has, which makes transform rules hard to use within flows.

An example of a simple transform rule can be found below (add_prefix_suffix):

    from ewx_public.transform_rule import TransformRule


    class AddPrefixSuffix(TransformRule):
        def apply(self, string_value, prefix_suffix_flag, **kwargs):
            """
            Rule that adds a specified `string_value` as either a prefix or suffix to the channel data.
    
            Args:
                string_value (str): String to add.
                prefix_suffix_flag (str): Add `string_value` as `prefix` or `suffix`.
    
            Returns:
                pandas.DataFrame
    
            """
    
            # Only transform rows that contain no missing values
            column = self.dataframe.columns[0]
            index = self.dataframe.dropna().index
            values = self.dataframe.loc[index, column]

            # Add prefix or suffix
            if prefix_suffix_flag == 'prefix':
                result = values.apply(lambda x: string_value + str(x))
            else:
                result = values.apply(lambda x: str(x) + string_value)
    
            # Write back with .loc (avoids pandas chained assignment) and return it
            self.dataframe.loc[index, column] = result
            return self.dataframe

Legacy imports

The legacy import (from energyworx_public.rule import AbstractTransformRule) is still supported on the platform. See ewx-public Package for the full migration guide.

There are several things to note here. A transform rule inherits TransformRule (previously AbstractTransformRule), not FlowRule. Because of this, it uses a different layout than normal rules do (as explained in [Rule Implementation](./0- Rule%20Implementation.md)). For example, transform rules do not have a prepare_context, as they cannot load any datasources or timeseries data. They also lack access to most methods a normal rule has, as those are unique to a flow and cannot be used in a transformation configuration. Instead, a transform rule has access to two important attributes:

self.dataframe: A Pandas dataframe that contains the columns with the data that needs transforming. This data is always of type string no matter what the data represents, as it came directly from the transformation configuration.
self.datasources: Contains a dict of all datasources that have been ingested and that this transform rule should keep in mind.
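Because the values in self.dataframe are always strings, any numeric transformation has to convert them first. The sketch below shows this with a plain pandas DataFrame standing in for self.dataframe; the column name `power`, the sample values and the factor of 1000 are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for self.dataframe: a single column of string values.
dataframe = pd.DataFrame({"power": ["1500", "250", None, "abc"]})

column = dataframe.columns[0]

# Convert the strings to numbers before doing arithmetic.
# errors="coerce" turns values that cannot be parsed into NaN instead of raising.
numeric = pd.to_numeric(dataframe[column], errors="coerce")

# Now the numeric transformation (here: dividing by 1000) is safe to apply.
scaled = numeric / 1000

print(scaled.tolist())
```

Whether unparseable values should become NaN or raise an error depends on the use case; `errors="coerce"` is just one possible choice.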

The result of a transform rule is either a pandas Series or a pandas DataFrame containing the transformed data (and not an instance of RuleResult, as with normal rules). Which one to return depends on whether the rule should be chainable. If you return a pandas Series, no further transform rule can be applied to the result. If you return a pandas DataFrame, another transform rule can be applied to it. The right choice depends on the transform rule itself. For a general-purpose rule like the one above (adding a prefix or suffix), it is likely that additional transformations will be performed on the data, so it is better to return a DataFrame: this keeps your transform rules independent of each other and allows chaining. However, if you are writing a transform rule that is very specific to a client's use case, and the data should never be transformed any further because of that, then it is better to return a Series.
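The difference between the two return types can be illustrated with two plain functions. This is a sketch only: the functions `add_prefix` and `to_client_code` are hypothetical and operate on an ordinary DataFrame rather than on a real TransformRule, and the platform's actual chaining mechanism is not shown.

```python
import pandas as pd


def add_prefix(dataframe, prefix):
    """Hypothetical general-purpose step: returns a DataFrame, so it can be chained."""
    column = dataframe.columns[0]
    out = dataframe.copy()
    mask = out[column].notna()  # leave missing values untouched
    out.loc[mask, column] = prefix + out.loc[mask, column].astype(str)
    return out


def to_client_code(dataframe):
    """Hypothetical client-specific final step: returns a Series, ending the chain."""
    column = dataframe.columns[0]
    return dataframe[column].str.upper()


raw = pd.DataFrame({"id": ["a1", None, "b2"]})

step1 = add_prefix(raw, "knmi_")   # DataFrame: a further step can follow
final = to_client_code(step1)      # Series: intended as the last step

print(type(step1).__name__, type(final).__name__)
```

Returning a DataFrame from the general-purpose step keeps it reusable in any position of a chain, while the Series returned by the client-specific step signals that nothing should follow it.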